NSF PAR Search | NSF Public Access Repository

A text-guided protein design framework

https://doi.org/10.1038/s42256-025-01011-z

Liu, Shengchao; Li, Yanjing; Li, Zhuoxinran; Gitter, Anthony; Zhu, Yutao; Lu, Jiarui; Xu, Zhao; Nie, Weili; Ramanathan, Arvind; Xiao, Chaowei; et al. (March 2025, Nature Machine Intelligence)

Current AI-assisted protein design utilizes mainly protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in text format describing proteins’ high-level functionalities, yet whether the incorporation of such text data can help in protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multimodal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representation of two modalities, a facilitator that generates the protein representation from the text modality and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441,000 text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.
Free, publicly-accessible full text available March 27, 2026
Equivariant graph neural operator for modeling 3D dynamics

Xu, Minkai; Han, Jiaqi; Lou, Aaron; Kossaifi, Jean; Ramanathan, Arvind; Azizzadenesheli, Kamyar; Leskovec, Jure; Ermon, Stefano; Anandkumar, Anima (May 2024, International Conference on Machine Learning)

Full Text Available
Linking the Dynamic PicoProbe Analytical Electron-Optical Beam Line / Microscope to Supercomputers

Brace, Alexander; Vescovi, Rafael; Chard, Ryan; Saint, Nickolaus D; Ramanathan, Arvind; Zaluzec, Nestor J; Foster, Ian (September 2023, arXiv:2308.13701)

Full Text Available
The 4th International Workshop on Epidemiology meets Data Mining and Knowledge Discovery (epiDAMIK 4.0 @ KDD2021)

https://doi.org/10.1145/3447548.3469475

Adhikari, Bijaya; Srivastava, Ajitesh; Pei, Sen; Kefayati, Sarah; Yu, Rose; Yadav, Amulya; Rodríguez, Alexander; Ramanathan, Arvind; Vullikanti, Anil; Prakash, B. Aditya (August 2021, KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining)

The 4th epiDAMIK@SIGKDD workshop is a forum to discuss new insights into how data mining can play a bigger role in epidemiology and public health research. While the integration of data science methods into epidemiology has significant potential, it remains understudied. We aim to raise the profile of this emerging research area of data-driven and computational epidemiology, and create a venue for presenting state-of-the-art and in-progress results, in particular results that would otherwise be difficult to present at a major data mining conference, including lessons learnt in the 'trenches'. The current COVID-19 pandemic has only showcased the urgency and importance of this area. Our target audience consists of data mining and machine learning researchers from both academia and industry who are interested in epidemiological and public-health applications of their work, and practitioners from the areas of mathematical epidemiology and public health.
Full Text Available
Data-driven efficient network and surveillance-based immunization

https://doi.org/10.1007/s10115-018-01326-x

Zhang, Yao; Ramanathan, Arvind; Vullikanti, Anil; Pullum, Laura; Prakash, B. Aditya (January 2019, Knowledge and Information Systems)

Full Text Available
AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics

https://doi.org/10.1177/10943420211006452

Casalino, Lorenzo; Dommer, Abigail C; Gaieb, Zied; Barros, Emilia P; Sztain, Terra; Ahn, Surl-Hee; Trifan, Anda; Brace, Alexander; Bogetti, Anthony T; Clyde, Austin; et al. (September 2021, The International Journal of High Performance Computing Applications)

We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike’s full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.
Full Text Available
